Skip to content

Conversation

@Sterling-Augustine
Copy link
Contributor

No description provided.

joker-eph and others added 30 commits July 29, 2025 10:34
)

This prints the prefix on every new line, allowing for an output that
looks like:
```
[dead-code-analysis] DeadCodeAnalysis.cpp:288 Visiting operation: func.func private @private_1() -> (i32, i32) {
[dead-code-analysis] DeadCodeAnalysis.cpp:288   %c0_i32 = arith.constant 0 : i32
[dead-code-analysis] DeadCodeAnalysis.cpp:288   %0 = arith.addi %c0_i32, %c0_i32 {tag = "one"} : i32
[dead-code-analysis] DeadCodeAnalysis.cpp:288   return %c0_i32, %0 : i32, i32
[dead-code-analysis] DeadCodeAnalysis.cpp:288 }
[dead-code-analysis] DeadCodeAnalysis.cpp:313 Visiting callable operation: func.func private @private_1() -> (i32, i32) {
[dead-code-analysis] DeadCodeAnalysis.cpp:313   %c0_i32 = arith.constant 0 : i32
[dead-code-analysis] DeadCodeAnalysis.cpp:313   %0 = arith.addi %c0_i32, %c0_i32 {tag = "one"} : i32
[dead-code-analysis] DeadCodeAnalysis.cpp:313   return %c0_i32, %0 : i32, i32
[dead-code-analysis] DeadCodeAnalysis.cpp:313 }
```
…clone" (llvm#150856)

Reverts llvm#150735 due to bot failures that I need to
investigate
When Mips process emitStartOfAsmFile and updateABIInfo, it did not know
the real value of IsSoftFloat and STI.useSoftFloat(). And when inline
asm instruction was empty, Mips did not process asm parser, so it would
not do TS.updateABIInfo(STI) again and at this time the value of
IsSoftFloat is correct.

Fix llvm#135283.
This is a follow up of 9e79991. It
attempts to fix the CI flakes we saw on fuchsia-linux-x64 builder.
Tracked at llvm#112294

This patch implements from [basic.link]p14 to [basic.link]p18 partially.

The explicitly missing parts are:
- Anything related to specializations.
- Decide if a pointer is associated with a TU-local value at compile
  time.
- [basic.link]p15.1.2 to decide if a type is TU-local.
- Diagnose if TU-local functions from other TU are collected to the
  overload set. See [basic.link]p19, the call to 'h(N::A{});' in
  translation unit llvm#2

There should be other implicitly missing parts as the wording uses
"names" briefly several times. But to implement this precisely, we have
to visit the whole AST, including Decls, Expression and Types, which may
be harder to implement and be more time-consuming for compilation time.
So I choose to implement the common parts.

It won't be too bad to miss some cases since we DIDN'T do any such
checks in the past 3 years. Any new check is an improvement. Given
modules have been basically available since clang15 without such checks,
it will be user unfriendly if we give a hard error now. And there are
a lot of cases which violating the rule actually just fine. So I decide
to emit it as warnings instead of hard errors.
* Replace undef -> poison
* Remove overloaded type in intrinsic signature
Vector costs without zvfhmin/zvfbfmin and zfhmin/zfbfmin are somehow
cheaper than with zvfhmin/zvfbfmin at smaller vector sizes, despite the
fact that the former are scalarized to libcalls. This adds a RUN line to
showcase this, splitting out the bfloat tests into their own functions
so we don't have duplicate lines for the regular float/double costs.
If we have instructions in second loop's preheader which can be sunk, we
should also be adjusting PHI nodes to receive values from the fused loop's latch block.

Fixes llvm#128600
Those are metadata sections for ELF but was not properly set for COFF.
I'm just trying to more consistently use CHECK-SD and CHECK-GI.
This relands commit
llvm@7355ea3.

The original commit was reverted in
llvm@bfd73a5
because it was breaking the buildbot.

The issue has now been resolved by
llvm@38f8253.

Original PR: llvm#139348
Original commit message:
<blockquote>

closes clangd/clangd#1037 
closes clangd/clangd#2240

Example:

```c++
class Base {
public:
  virtual void publicMethod() = 0;

protected:
  virtual auto privateMethod() const -> int = 0;
};

// Before:
//                        // cursor here
class Derived : public Base{}^ ;

// After:
class Derived : public Base {
public:
  void publicMethod() override {
    // TODO: Implement this pure virtual method.
    static_assert(false, "Method `publicMethod` is not implemented.");
  }

protected:
  auto privateMethod() const -> int override {
    // TODO: Implement this pure virtual method.
    static_assert(false, "Method `privateMethod` is not implemented.");
  }
};
```


https://github.com/user-attachments/assets/79de40d9-1004-4c2e-8f5c-be1fb074c6de

</blockquote>
This effectively reverts a4d4859 which
didn't fix the problem that `int*,` was not counted as "Left" alignment.

Fixes llvm#150327
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.

Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.

This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.

Fixes llvm#110753.
Fixes llvm#121390.
This is a nomination for the maintainers of the core category within
MLIR as proposed in
https://discourse.llvm.org/t/mlir-project-maintainers/87189. As agreed
in the Project Council meeting on July 17, we are proceeding with
category nominations without waiting for lead maintainers to be
nominated.
)

In llvm#149156, I ensured that we
no longer generate spurious `tensor.empty` ops when vectorizing
`linalg.unpack`.

This follow-up removes leftover code that is now redundant but was
missed in the original PR.
This is a nomination for the maintainers of the tensor compiler category
within MLIR as proposed in
https://discourse.llvm.org/t/mlir-project-maintainers/87189. As agreed
in the Project Council meeting on July 17, we are proceeding with
category nominations without waiting for lead maintainers to be
nominated.
This isn't needed after we set the tail folding style to data-with-evl
via TTI in llvm#148686.  Also rename the tests to reflect the fact they're
no longer forcing the tail folding style.
Those sections are generated by -fembed-bitcode and do not need to be
kept in executable files.
timblechmann and others added 29 commits July 29, 2025 10:35
The SetThreadInformation API allows threads to be scheduled on the most
efficient cores on the most efficient frequency.
Using this API for ThreadPriority::Background should make clangd-based
IDEs a little less CPU hungry.

---------

Co-authored-by: Alexandre Ganea <[email protected]>
This PR adds support for `-ffine-grained-bitfield-accesses`. I reused
the tests from classic CodeGen, available here:

[https://github.com/llvm/llvm-project/blob/c2c881fcc85e0c2d7a050b0199d4dadf8f556b9e/clang/test/CodeGenCXX/finegrain-bitfield-access.cpp](https://github.com/llvm/llvm-project/blob/c2c881fcc85e0c2d7a050b0199d4dadf8f556b9e/clang/test/CodeGenCXX/finegrain-bitfield-access.cpp)

We produce almost exactly the same codegen, except when returning a
variable: we emit an extra variable to hold the return value, whereas
classic CodeGen does not. Also, the GEP instructions use slightly
different syntax compared to classic CodeGen.
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates symbols that were recently
added to LLVM without proper annotations. The annotations currently have
no meaningful impact on the LLVM build; however, they are a prerequisite
to support an LLVM Windows DLL (shared library) build.

## Background

This effort is tracked in llvm#109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

## Overview

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.

The following manual adjustments were also applied after running IDS:
- Add `LLVM_EXPORT_TEMPLATE` and `LLVM_TEMPLATE_ABI` annotations to
explicitly instantiated instances of `llvm::object::SFrameParser`.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
This patch adds a new variety of driver to Dexter, allowing it to work
with DAP-based interfaces for debuggers. The first concrete instance of
this is implemented in this patch, adding support for an `lldb-dap`
debugger. This is functionally very similar to the existing LLDB
debugger support*, but uses lldb-dap as its executable instead of lldb.

This has been tested successfully against the existing feature_test
suite, and manually tested against some other inputs; support is
essentially complete, although any further DAP-based debuggers may
require additional hooks inserted into the base class to deal with any
idiosyncrasies they exhibit (as with the several that have been inserted
for lldb-dap).

NB: There are some small differences resulting from differences between
lldb-dap's use of the lldb API and Dexter's use in its lldb driver; one
small example of this is when evaluating variables, lldb-dap will try to
first use `GetValueForVariablePath` and fallback to `EvaluateExpression`
if necessary, while Dexter will always use `EvaluateExpression`; these
can give slightly different results, resulting in different output from
Dexter for the same input.
llvm#150443)

Update docs to state that reduction is supported on OpenMP `loop` and
`teams` standalone and compound constructs.
This PR introduces the initial version of a C++ framework for the
conformance testing of GPU math library functions, building upon the
skeleton provided in llvm#146391.

The main goal of this framework is to systematically measure the
accuracy of math functions in the GPU libc, verifying correctness or at
least conformance to standards like OpenCL via exhaustive or random
accuracy tests.
This adds standard-comforming handling for calls to functions that were
declared in C source in the no prototype form.
This PR implements fabsbf16 math function for BFloat16 type along with
the tests.

---------

Signed-off-by: krishna2803 <[email protected]>
Signed-off-by: Krishna Pandey <[email protected]>
Co-authored-by: OverMighty <[email protected]>
This adds support for array cleanups, including the ArrayDtor op.
My aim here is to make these a little easier to maintain by relying on
aliases where these instructions overlap with the Hint instructions they
are based on.

The following instructions have not been converted to aliases as they
have complex mappings from ther immediate encodings to the immediate
encoding of the underlying instruction (setting high bits):
- qc.pputci
- qc.sync, qc.sync, qc.syncwf, qc.syncwl
- qc.c.sync, qc.c.syncr, qc.c.syncwf, qc.syncwl

Co-authored-by: Sudharsan Veeravalli <[email protected]>
…#149394)"

This reverts commit 83dfdd8.

Temporary revert, as the above patch contains some python code requiring at
least version 3.10, when the minimum required by LLVM is 3.8.
…1182)

I was observing segfaults at executable exit in the rtsan instrumented
unit tests. Bisecting the offending test led to observing that this test
is not using our safe test fixture for anything involving a file
descriptor. Changing to use the fixture eliminated the segfault on exit.
…48713)

`LocationDescription` contains both the insertion point and the debug
location. When `LocationDescription` is available, it is better to use
`updateToLocation` which will update both. This PR replaces
`restoreIP(Loc.IP)` with `updateToLocation(Loc)` as former may not
update debug location in all cases.

I am not checking the return value of `updateToLocation` because that is
checked just a few lines above in all cases and we would have returned
early if it failed.
Added crash on nullptr to mbstowcs

---------

Co-authored-by: Sriya Pratipati <[email protected]>
)

Fixed build dependencies for pthread_barrier_t (add __barrier_type to
cmake dependencies)
…sts (llvm#150867)

There is a pattern that rewrites
elementwise_op(broadcast(x1 : T to U), broadcast(x2 : T to U), ...) to
broadcast(elementwise_op(x1, x2, ...) : T to U).

This pattern did not, however, account for the case where a broadcast
constant is represented as a SplatElementsAttr, which can safely be
reshaped or scalarized but is not a `vector.broadcast` or `vector.splat`
operation.

This patch fixes this oversight, prenting premature broadcasting.

This did result in the need to update some linalg dialect tests, which
now feature a less-broadcast computation and/or more constant folding.
- BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64
   - DS_ADD_F64

Co-authored-by: Konstantin Zhuravlyov <Konstantin [email protected]>
This change implements correct lowering of function aliases to the LLVM
dialect.
…r convs (llvm#149576)

This PR fixes the computation of padded shapes for convolution-style
affine maps (e.g., d0 + d1) in `PadTilingInterface`. Previously, the
codes used the direct sum of loop upper bounds, leading to over-padding.
For example, the following `conv_2d_nhwc_fhwc` op, if only padding the c
dimensions to multiples of 16, it also incorrectly pads the convolved
dimensions and generates the wrong input shape as:

```
%padded = tensor.pad %arg0 low[0, 0, 0, 0] high[0, 1, 1, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
  tensor.yield %cst : f32
} : tensor<1x16x16x4xf32> to tensor<1x17x17x16xf32>
%padded_0 = tensor.pad %arg1 low[0, 0, 0, 0] high[0, 0, 0, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
  tensor.yield %cst : f32
} : tensor<16x3x3x4xf32> to tensor<16x3x3x16xf32>
%0 = linalg.conv_2d_nhwc_fhwc {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%padded, %padded_0 : tensor<1x17x17x16xf32>, tensor<16x3x3x16xf32>) outs(%arg2 : tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32>
return %0 : tensor<1x14x14x16xf32>
```

The new implementation uses the maximum accessed index as the input for
affine map and then adds 1 after aggregating all the terms to get the
final padded size. This fixed
llvm#148679.
)

Cygwin and MinGW share the auto import behavior that could result in
__stack_check_guard being non-dso-local. Allow windres to assume a
Cygwin target as well as a MinGW one, so defines like _WIN32 would not
be present on Cygwin.
…m#149637)

The Cygwin target is generally very similar to the MinGW target. The
default auto-import behavior, the default calling convention, the
`.dll.a` import library extension, the `__GXX_TYPEINFO_EQUALITY_INLINE`
pre-define by `g++`, and the long double configuration.

Co-authored-by: Mateusz Mikuła <[email protected]>
…m#151042)

D16 pesudo instructions are introduced in true16 mode to represet a D16
load/store. In MC lowering, the pesudo instructions are lowered to the
corresponding D16 Lo/Hi MC Inst respecting the register allocation.

However, the pesudo instruction has size 0 and cause an issue in the
Inst size estimation. Use D16 Lo when calculating inst size
llvm#149614)

When OpenACC is enabled and Fortran loops are annotated with `acc loop`,
they are lowered to `acc.loop` operation. And rest of the contained
loops use the normal FIR lowering path.

Hovever, the OpenACC specification has special provisions related to
contained loops and their induction variable. In order to adhere to
this, we convert all valid contained loops to `acc.loop` in order to
store this information appropriately.

The provisions in the spec that motivated this change (line numbers are
from OpenACC 3.4):
- 1353 Loop variables in Fortran do statements within a compute
construct are predetermined to be private to the thread that executes
the loop.
- 3783 When do concurrent appears without a loop construct in a kernels
construct it is treated as if it is annotated with loop auto. If it
appears in a parallel construct or an accelerator routine then it is
treated as if it is annotated with loop independent.

By valid loops - we convert do loops and do concurrent loops which have
induction variable. Loops which are unstructured are not handled.
Extend support in LLDB for WebAssembly. This PR adds a new Process
plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly
targets. It adds support for WebAssembly's memory model with separate
address spaces, and the ability to fetch the call stack from the
WebAssembly runtime.

I have tested this change with the WebAssembly Micro Runtime (WAMR,
https://github.com/bytecodealliance/wasm-micro-runtime) which implements
a GDB debug stub and supports the qWasmCallStack packet.

```
(lldb) process connect --plugin wasm connect://localhost:4567
Process 1 stopped
* thread llvm#1, name = 'nobody', stop reason = trace
    frame #0: 0x40000000000001ad
wasm32_args.wasm`main:
->  0x40000000000001ad <+3>:  global.get 0
    0x40000000000001b3 <+9>:  i32.const 16
    0x40000000000001b5 <+11>: i32.sub
    0x40000000000001b6 <+12>: local.set 0
(lldb) b add
Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c
(lldb) c
Process 1 resuming
Process 1 stopped
* thread llvm#1, name = 'nobody', stop reason = breakpoint 1.1
    frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
   1    int
   2    add(int a, int b)
   3    {
-> 4        return a + b;
   5    }
   6
   7    int
(lldb) bt
* thread llvm#1, name = 'nobody', stop reason = breakpoint 1.1
  * frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
    frame llvm#1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12
    frame llvm#2: 0x40000000000001fe wasm32_args.wasm
```

This PR is based on an unmerged patch from Paolo Severini:
https://reviews.llvm.org/D78801. I intentionally stuck to the
foundations to keep this PR small. I have more PRs in the pipeline to
support the other features/packets.

My motivation for supporting Wasm is to support debugging Swift compiled
to WebAssembly:
https://www.swift.org/documentation/articles/wasm-getting-started.html
…, X, and(B,C)) (llvm#141733)

## Description
<!--- Title/Description will be Subject/Body of commit message.      -->
<!--- Please be concise and limit the subject line to 50 characters, -->
<!--- and wrap the Description at 72 characters.                     -->
<!--- Describe why this is required, what problem it solves.         -->
Adds support for ternary equivalent operations of the form `ternary(A,
X, and(B,C))` where `X=[xor(B,C)| nor(B,C)| eqv(B,C)| not(B)| not(C)]`.

List of `xxeval` equivalent ternary operations added and the
corresponding `imm` value required:

Ternary Operator| Imm Value
--|--
ternary(A,  xor(B,C), and(B,C))	| 22
ternary(A,  nor(B,C), and(B,C))	| 24
ternary(A,  eqv(B,C), and(B,C))	| 25
ternary(A,  not(C), and(B,C))	| 26
ternary(A,  not(B), and(B,C))	| 28

eg.  `xxeval XT,XA,XB,XC,22` 
- performs `XA ? xor(XB, XC) : and(XB,XC)`and places the result in `XT`.

Co-authored-by: Tony Varghese <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.